Speaker and noise independent voice activity detection
نویسندگان
چکیده
Voice activity detection (VAD) in the presence of heavy, non-stationary noise is a challenging problem that has attracted attention in recent years. Most modern VAD systems require training on highly specialized data: either labeled mixtures of speech and noise that are matched to the application, or, at the very least, noise data similar to that encountered in the application. Because obtaining labeled data can be a laborious task in practical applications, it is desirable for a voice activity detector to be able to perform well in the presence of any type of noise without the need for matched training data. In this paper, we propose a VAD method based on non-negative matrix factorization. We train a universal speech model from a corpus of clean speech but do not train a noise model. Rather, the universal speech model is sufficient to detect the presence of speech in noisy signals. Our experimental results show that our technique is robust to a variety of non-stationary noises mixed at a wide range of signal-to-noise ratios and outperforms a baseline.
منابع مشابه
A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)
Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, much of the non-speech signals maybe classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...
متن کاملSpeaker-Dependent Voice Activity Detection Robust to Background Speech Noise
In this paper, we proposed a speaker-dependent voice activity detection (VAD) algorithm that only extracts the speech period uttered by a target user. Based on our survey of the recognition error of real speech data collected in “VoiceTra,” which is a speech-to-speech translation system for smartphones, we found that many word insertion errors are caused by background speakers’ speech. Our VAD,...
متن کاملRobust voice activity detection for narrow-bandwidth speaker verification under adverse environments
We describe a voice activity detection algorithm which leads to significant improvement of a narrow-bandwidth speaker verification system under harsh environments. This algorithm is based on a time-scale feature which is extracted from wavelet subbands. A statistical quantile filtering technique is proposed to estimate an adaptive noise threshold. A hang-over scheme is then applied to bridge sh...
متن کاملMUSAN: A Music, Speech, and Noise Corpus
This report introduces a new corpus of music, speech, and noise. This dataset is suitable for training models for voice activity detection (VAD) and music/speech discrimination. Our corpus is released under a flexible Creative Commons license. The dataset consists of music from several genres, speech from twelve languages, and a wide assortment of technical and non-technical noises. We demonstr...
متن کاملThe QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition
The QUT-NOISE-SRE protocol is designed to mix the large QUT-NOISE database, consisting of over 10 hours of background noise, collected across 10 unique locations covering 5 common noise scenarios, with commonly used speaker recognition datasets such as Switchboard, Mixer and the speaker recognition evaluation (SRE) datasets provided by NIST. By allowing common, clean, speech corpora to be mixed...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013